Reducing Downtime Due to System Maintenance and Upgrades

Author

  • Shaya Potter

Abstract

Patching, upgrading, and maintaining operating system software is a growing management complexity problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and associated users with an isolated, machine-independent virtualized environment that is decoupled from the underlying operating system instance. This virtualized environment is integrated with a novel checkpoint-restart mechanism that allows processes to be suspended, resumed, and migrated across operating system kernel versions with different security and maintenance patches. AutoPod incorporates a system status service to determine when operating system patches need to be applied to the current host, then automatically migrates application services to another host to preserve their availability while the current host is updated and rebooted. We have implemented AutoPod on Linux without requiring any application or operating system kernel changes. Our measurements on real-world desktop and server applications demonstrate that AutoPod imposes little overhead and provides sub-second suspend and resume times that can be an order of magnitude faster than starting applications after a system reboot. AutoPod enables systems to autonomically stay updated with relevant maintenance and security patches, while ensuring no loss of data and minimizing service disruption.
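The workflow the abstract describes (detect pending patches, checkpoint the running services, restart them on another host, then update and reboot the original host) can be illustrated with a small in-memory sketch. This is not AutoPod's actual interface: the `Host` class, its methods, and `update_with_migration` are hypothetical names chosen for illustration, and a plain dictionary stands in for real checkpointed process state.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a machine that can run pods."""
    name: str
    pending_patches: list = field(default_factory=list)
    pods: dict = field(default_factory=dict)  # pod name -> saved process state

    def needs_update(self) -> bool:
        # Stand-in for the "system status service" check (assumption).
        return bool(self.pending_patches)

    def checkpoint(self, pod: str) -> dict:
        # Suspend the pod: remove it from this host and return its state.
        return self.pods.pop(pod)

    def restart(self, pod: str, state: dict) -> None:
        # Resume the pod on this host from previously saved state.
        self.pods[pod] = state

    def apply_patches_and_reboot(self) -> None:
        # Rebooting is only safe once every pod has been moved away.
        if self.pods:
            raise RuntimeError("migrate pods away before rebooting")
        self.pending_patches.clear()


def update_with_migration(current: Host, spare: Host) -> bool:
    """Move all pods to `spare`, then patch `current`.

    Returns True if an update was performed, False if none was needed.
    """
    if not current.needs_update():
        return False
    for pod in list(current.pods):
        state = current.checkpoint(pod)
        spare.restart(pod, state)
    current.apply_patches_and_reboot()
    return True
```

In this sketch the pod's state survives the move unchanged, which mirrors the abstract's goal of updating the host without losing application data; a real system would also have to handle network connections, file systems, and kernel version differences, which the sketch ignores.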


Similar resources

Reducing Downtime Due to System Maintenance and Upgrades (Awarded Best Student Paper!)

Patching, upgrading, and maintaining operating system software is a growing management complexity problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and associated users with an isolated, machine-independent virtualized env...


MetaMorphMagi: From Offline to Online Software Upgrades in Large-Scale IT Infrastructures

Software upgrades are one of the leading causes of downtime in IT infrastructures. Long-running data-migration processes require intensive up-front preparation, extended maintenance windows, and close monitoring, and they impose a significant burden on the system administrators. Even worse, major upgrades sometimes fail due to complex, hidden dependencies within the system, causing unplanned down...


Why Do Upgrades Fail And What Can We Do About It? Toward Dependable, Online Upgrades in Enterprise System

Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approach...


Why Do Upgrades Fail and What Can We Do about It? Toward Dependable, Online Upgrades in Enterprise Systems

Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approache...


An application of artificial neural network to maintenance management

This study shows the usefulness of an Artificial Neural Network (ANN) in maintenance planning and management. An ANN model based on a multi-layer perceptron with three hidden layers and four processing elements per layer was built to predict the expected downtime resulting from a breakdown or a maintenance activity. The model achieved an accuracy of over 70% in predicting the expected downtime.




Publication date: 2005